A Web-Enabled and Speech-Enhanced Parallel Corpus of Greek-Bulgarian Cultural Texts

Authors

  • Voula Giouli
  • Nikos Glaros
  • Kiril Ivanov Simov
  • Petya Osenova
Abstract

This paper reports on completed work carried out in the framework of an EU-funded project aimed at (a) developing a bilingual collection of cultural texts in Greek and Bulgarian, (b) creating a number of accompanying resources that facilitate study of the primary texts across languages, and (c) integrating a system that provides web-enabled and speech-enhanced access to digitized bilingual Cultural Heritage resources. The system's simple user interface, which incorporates advanced search mechanisms, also offers innovative accessibility for visually impaired Greek and Bulgarian users. The rationale behind the work (and the resulting resource) was to promote the comparative study of the cultural heritage of the two countries.

Similar resources

International Workshop Multilingual Resources, Technologies and Evaluation for Central and Eastern European Languages

This paper discusses the building of the first Bulgarian–Polish–Lithuanian (for short, BG–PL–LT) experimental corpus. The BG–PL–LT corpus (currently under development only for research) contains more than 3 million words and comprises two corpora: parallel and comparable. The BG–PL–LT parallel corpus contains more than 1 million words. A small part of the parallel corpus comprises original te...

Linguistic Motivation in Automatic Sentence Alignment of Parallel Corpora: the Case of Danish-Bulgarian and English-Bulgarian

We report the results from a sentence-alignment experiment on Danish-Bulgarian and English-Bulgarian parallel texts applying a method based in part on linguistic motivations as implemented in the TCA2 aligner. Since the presence of cognates has a bearing on the alignment score of candidate sentences, we attempt to bridge the gap between source and target languages by transliteration of the Bulgari...

How Greek the Web Is

The Internet, apart from being a huge repository of information of any kind, has become the main means of modern communication, and the World Wide Web has emerged as a new sort of society, since it usually reflects almost all aspects of modern societies in terms of their economic, political and social status and structure. Therein, over wired and wireless connections, through ingenious ideas, i.e., algorithms...

Bulgarian X-language Parallel Corpus

The paper presents the methodology and the outcome of the compilation and the processing of the Bulgarian X-language Parallel Corpus (Bul-X-Cor) which was integrated as part of the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora which include a diversity of domains and genres, reflect the relations between Bulgarian and other languages and are consistent ...

The MULTEXT-East corpus

The EU MULTEXT-East project has produced harmonised language resources for Bulgarian, Czech, Estonian, Hungarian, Romanian, and Slovene. In this paper we introduce the MULTEXT-East multilingual corpus, which comprises marked-up texts in the six languages totaling approximately 2 million words and a small speech corpus. The corpus is encoded in SGML, in the TEI-like Corpus Encoding Specification...

Publication date: 2009